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'Abstract 



■"sj" Xong length-scale structural deformations of DNA play a central role in many biological processes including gene expres- 
sion. The elastic rod model, which uses a continuum approximation, has emerged as a viable tool to model deformations 
pf DNA molecules. The elastic rod model predictions arc however very sensitive to the constitutive law (material propcr- 
Cl^tics) of the molecule, which in turn, vary along the molecules length according to its base-pair sequence. Identification of 



the nonlinear sequence-dependent constitutive law from experimental data and feasible molecular dynamics simulations 
remains a significant challenge. In this paper, wc develop techniques to use elastic rod model equations in combination 
^ .with limited experimental measurements or high-fidelity molecular dynamics simulation data to estimate the nonlinear 
Q constitutive law governing DNA molecules. We first cast the elastic rod model equations in state-space form and express 
^ ,the effect of the unknown constitutive law as an unknown input to the system. We then develop a two-step technique 
^ to estimate the unknown constitutive law. We discuss various generalizations and investigate the robustness of this 
' ^ .technique through simulations. 
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\ Designing and engineering DNA molecules to achieve 
desired biological activity has numerous applications that 
can lead to advances in disease prevention, diagnosis, and 
cure, and is an active area of research. For instance, re- 
combinant DNA technology and gene therapy, which rely 
■heavily on engineered DNA, are revolutionizing the way we 
[treat genetic diseases such as severe combined immunode- 
ficiency (SCID), cystic fibrosis, hemophilia, muscular dys- 
trophy and sickle cell anemia. In addition, designing DNA 
■molecules to achieve desired biological activity is poten- 
jtially useful in other genetic engineering applications such 
as engineering algae to produce and synthesize biofuels. 

The biological activities of DNA including gene expres- 
sion are significantly infiucnccd by its long l eng th-scale 



structural deformations such as looping |28l . |32| . which 
in turn is tied to its chemical make-up, the base-pair se- 
quence. For example, as shown in Figure [U activity of the 
genes in lac-operon in the bacterium E.coli is governed 
by the sequence-dependent looping behavior of its non- 
coding DNA segment adjacent to the genes [l^l- Thus 
looping acts as a biological switch to turn on or off the 
gene expression by restricting access to the transcription 



initiation site on DNA. In fact, this example has become 
a paradigm in understanding looping as a common gene- 
regulatory mechanism. 

The success of designed DNA molecules for disease pre- 
vention or genetic engineering applications therefore de- 
pends not only on the genetic information contained in 
the DNA but also on the structural deformations of non- 
coding "junk DNA." Hence there is a need to understand 
and model the structural deformation of the non-coding 
DNA segments in order to design DNA molecules that 
not only have the right genetic information but also un- 
dergo desired structural deformations necessary to acti- 
vate biological mechanism. This is demonstrated by the 
fact that various designed sequences of non-coding DNA 
in lac-operon shown in Figure [1] affect the gene expression 
level Hfllii. 
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Moreover, non-coding DNA is a significantly larger part 
of the genome than the coding part (more than 98 % in the 
human genome (Hj). The non-coding part is thus a major 
consideration for designing DNA and has been mostly ig- 
nored thus far. Therefore, understanding the biologically- 
relevant structural deformations of DNA molecules will 
greatly accelerate discovery in genetic-disease prevention, 
diagnosis, cure, and other genetic engineering applications. 
Both static and dynamic deformations of DNA play a sig- 
nificant role in its biological activity, thus understanding 
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and modeling these deformations and their relationship 
with base-pair sequences represents a significant challenge 

Among the several existing approaches to model struc- 
tural deformation in DNA molecules [H [H [H, [li, [H, i, 
EES, Sins Sin, elastic rod models are based on 
a continuum approximation, are computationally efficient, 
and are applicable to long length scales. In this approach, 
DNA molecules are viewed as continuous filaments. These 
models have the capability to efficiently represent large 
nonlinear structural deformations with arbitrary loading 
and even account for complex interactions ll|, |36| . The 



use of rod models is reasonably well-established in the lit- 
erature on DNA modeling as reviewed in [2^ and 23 1. 



Recent rod models have achieved some promising mile- 
stones in describing biologically-relevant deformations of 
DNA molecules 0[IS[I3- However, a key component 
of these elastic rod models is a constitutive law (material 
law), which follows from the bond stiffnesses and other 
atomistic-level interactions and can be approximated by 
Hooke's law. The constitutive law represents the relation- 
ship between the base-pair sequence and the bulk-level 
elastic properties of DNA molecules, and is largely un- 
known. A simplistic view of this constitutive law is that it 
represents the "springiness" of the DNA molecule and its 
relationship to the base-pair sequence. 

In macro-scale applications, it is often possible to de- 
rive the constitutive law from first-principles, or use exper- 
imental measurements to directly determine the constitu- 
tive law. However, with DNA molecules these approaches 
are impractical. Thus the alternative is to use limited ex- 
perimental measurements to estimate the constitutive law. 



^ Looped non-coding 
DNA 



Genes 




Lactose- 
Repressor Protein 



Figure 1: Schematic of Lac-Operon in bacterium E.coli. This 
example illustrates the role of structural deformation of the 
non-coding DNA in regulating genes. The activity of three 
genes LacZ, LacY and LacA is regulated by looping of adjacent 
non-coding DNA. The looping is mediated by binding with a 
V-shaped protein, the Lactose-Repressor. 

Traditionally, the constitutive law is represented by 
simple elasticity parameters that are tuned by trial and er- 



ror to match data from specific experiments. For example, 
average (non-sequence-dependent) bending and torsional 
stiffness is characterized through single-molecule experi- 
ments in [isl . 34 1 . Furthermore, efforts are underway to 



estimate linear sequence-dependent constitutive law using 
a massive and systematic collection of molecular dynamics 
simulations 0,0, 21, 21\- The time and effort involved in 
trial and error has limited researchers to consider overly 
simplified models wherein they fit just one or a few pa- 
rameters (33I I. However, there is no clear consensus on the 
constitutive law's functional form and how it maps from 
the base-pair sequence. Recent experimental observations 
0, Q indicate that linear constitutive laws are ineffective 
in capturiiig t he g eneral structural deformations of DNA 
molecules [35l |33| . The general form of constitutive law 
remains largely unknown, and the capability to determine 
constitutive law of DNA that governs its structure and dy- 
namics is severely limited by the absence of a systematic 
approach to best leverage the limited data available. 

In this paper, we develop techniques to use elastic rod 
model equations in combination with limited experimental 
measurements or high-fidelity molecular dynamics simula- 
tion data to estimate the nonlinear constitutive law gov- 
erning DNA molecules. Moreover, the techniques devel- 
oped in this paper are also directly applicable to other 
bio-filaments of medical interest such as collagen fibers, 
cilia, flagella, and microtubules. We first cast the elastic 
rod model equations in state-space form and express the 
effect of the unknown constitutive law as an unknown in- 
put to the system. We then develop a two-step technique, 
in which we first apply simultaneous input reconstruction 



and state estimation techniques [18|, |2J, |25| to estimate 
the unmeasured states and the unknown inputs, then use 
least-squares fitting to estimate the unknown constitutive 
law. We discuss various generalizations and investigate the 
robustness of this technique through simulations. 

2. Problem Formulation 

We consider experiments in which bio-filaments such as 
DNA molecules are clamped at one end and loading forces 
are applied to the other free end. We focus our attention 
on steady-state deformations and thus ignore transient re- 
sponse. For generality, our subsequent development will 
be based on bio-filaments. Specific detail relating to DNA 
molecules will be pointed out periodically. 

Let s be the spatial variable along the length of the bio- 
filament (rod). Let /(s) G M.^ be the three-dimensional net 
internal force (tensile and shear) vector, q{s) G M.^ be the 
net internal moment (bending and twisting) vector, and 
k{s) € be the curvature of the rod, respectively, at 
the location s along the rod. We express all the above- 
mentioned vectors with respect to a body-fixed reference 
frame, that is, the basis vectors are attached to each indi- 
vidual cross section of the bio- filament. 



See 



11 



for details 



on this reference frame. Furthermore, assuming no exter- 
nal field forces and moments other than boundary forces 



are applied on the bio-filaments, the elastic rod model de- 
scribing the motion of the bio-filament [111 then simplifies 
to the following two vector differential equations in terms 
of the spatial variable s 



2.21) in the state-space form as 



dq 

— + KXq = fxt, 
ds 

ds 

with the nonlinear constitutive law 

g{K{s),q{s)J{s),s) = 0, 



(2.1) 
(2.2) 

(2.3) 



where g : ^ R'^ is a vector function with three compo- 
nents and i is the tangent vector to the centerline of the 
rod. 

By setting the origin (s = 0) to be the free end and the 
clamped end to be s = L, we treat the above problem as an 
initial value problem by prescribing the loading conditions 
at the free end. Note that, in general the steady-state 
deformation of a distributed-parameter system with s as 
the independent variable is not a causal system and must 
be treated as a boundary- value problem and not an initial 
value problem. However, within a cantilever framework, 
by focussing on the curvature k(s), which is the second 
derivative of the deformation with respect to s, the system 
is no longer non-causal and can be treated as an initial- 
value problem. 

Thus if the loading conditions at the free end and the 
constitutive law are known, the rod model equations (|2.1[) . 
(|2.2p and the constitutive law (|2.3p can be solved using a 
differential algebraic equation (DAE) solver. To simplify 
the discussion, we assume that the constitutive law can 
be expressed in the following explicit form with no depen- 
dence on s 



'«(«) = 9iqis)J{s))- 



(2.4) 



Although there is loss of generality in this form and the 
constitutive law for DNA molecules is known to be base- 
pair sequence dependent and hence s-dcpendcnt, it allows 
us to simplify the discussion and implementation of subse- 
quent methods. We will discuss the more general implicit 
form (|2.3p and s dependence in Section [31 

Next, to cast the elastic rod model (|2.1[) . (j2.2p in the 
state-space form with the independent variable as s, we 



define the state vector as x{s) 
input vector as u{s) ~ k{s) = 



— to3 



We then write 



and the 



d 

ds 



ip{x{s 



qi(s) 
q2(s) 
q3(s) 

Us) 
Us) 

<i3{s)K.2{s) - q2(s)K3(s) + Us) 

-q3(s)Ki(s) -t- qi(s)K3(s) - fi(s) 
q2(s)K2(s) - qi(s)Ki(s) 

f3(s)K2(s) - Us)K3is) 
-Us)ki{s) + fi(s)K3(s) 
Us)K2is) - fi(s)/ti(s) 

u{s)), 



(2.5) 



where t/- : x M 
ment equation is 



A general form of the measure- 



y{s) = h{x,u), 



(2.6) 



where y{s) G M' represent the measured variables and 
/i : — > R' is the measurement function. The measure- 
ments y can either be obtained from experiments on DNA 
molecules, or from high-fidelity simulations of DNA such 
as molecular-dynamics simulations. 

We refer to (j2.5p as the rod model equations. Note 
that the unknown constitutive law (|2.4p relating k{s) with 
q{s) and f{s) now becomes 



u{s) = g{x{s)). 



(2.7) 



Figure[5]illustrates the relationship between the rod model, 
constitutive law and measurement function. Finally, note 
that if a subset of the states x{s) are measured, then y{s) 
can be written as 



y{s) = Cx{s), 



(2.8) 



where C G R " is a matrix with zeros and ones as its 
entries. 

The problem can then be stated as follows. 
Problem: Use the measurements y{s) along with known 
model equations (|2.5p to estimate the constitutive law (|2.7p . 

3. Constitutive-law Estimation 

First, we note that if measurements of u{s) and x{s) 
(that is, measurements of k{s), q{s), and /(s)) are avail- 
able, estimating the constitutive law becomes a static non- 
linear function estimation problem. 

When either u(s) or x{s) are unknown, estimating the 
constitutive law is treated as a two-step process. In step 
1 we estimate the unknown u(s) or x{s), then in step 2 
we use estimates obtained from step 1 to approximate the 
constitutive law using least-squares fitting. Specifically, 
we have the following four scenarios: 
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technique, we use least-squares fitting with the estimated 
u(s) and the measured/estimated x{s) to compute an es- 
timate of the constitutive law. Thus the methodology for 
estimating the constitutive law can be summarized as the 
following two-step procedure 

Step 1: Use input reconstruction (with simultaneous state 
estimation) to estimate u{s) (and x{s)) 

Step 2: Use least-squares fitting with estimates of u{s) and 
x{s) to approximate the constitutive law 



Figure 2: Block diagram representing the relationship between the 
rod model, constitutive law and experimental measurements. The 
red block denotes the unknown constitutive law, whose output can 
be treated as an unknown input to the rod model I I2.5II . 
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Scenario 1 


Measured 


Unknown 


Model simulation to estimate x{s) 


Scenario 2 


Measured 


Partially measured 


Kalman filtering to estimate x{s) 


Scenario 3 


Unknown 


Measured 


Input reconstruction to estimate w(s) 


Scenario 4 


Unknown 


Partially Measured 


Simultaneous state estimation 
and input reconstruction 


Impossible 


Unknown 


Unknown 





Table 1: Various scenarios of available measurements. 



1. Measurements of u(s) are available, but x{s) is un- 
known. 

2. Measurements of m(s) and y{s) are available, but the 
full state x{s) is unknown. 

3. Measurements of the full state x{s) are available, but 
u{s) is unknown. 

4. Measurements oiy{s) are available, but u{s) and the 
full state x{s) are unknown. 

The above scenarios are summarized in Table [1] 
In scenarios 1 and 2, since u{s) is known, we can run 
a numerical simulation using the rod model (j2.5[) to esti- 
mate x{s). In scenario 2 the estimate of x{s) can be fur- 
ther improved by using the measured y(s) with a nonlinear 
state-estimation algorithm such as the unscented Kalman 
filter [l^. In both scenarios, once estimates of x{s) are 
obtained, we use least-squares fitting using the measured 
u{s) and estimated x{s) to approximate the unknown con- 
stitutive law. An in-depth treatment of scenario 1 and 
further discussion on scenario 2 are presented in [ij. 

In the current paper, we focus on scenarios 3 and 4. For 
these scenarios, we develop a two-step technique for esti- 
mating the constitutive law. In both these scenarios, since 
the input u{s) is unknown, the first step of the two-step 
technique is an input-reconstruction problem. In scenario 
4, in addition to reconstructing inputs, we need to esti- 
mate the states. As discussed in the following subsection, 
the fact that w(s) is not an external input but the effect 
of an internal feedback nonlinearity as seen in Figure [2] 
does not affect the input reconstruction. Once estimates 
of m(s) and x{s) are available, in step two of the two-step 



3.1. Step 1: Input Reconstruction 

In step 1 of the estimation technique, the objective is to 
estimate states x{s) and unknown inputs m(s), given mea- 
surements of the output y{s). First, we briefly summarize 
results from [l^, H^] for simultaneous input reconstruction 
and state estimation for a linear discrete system. 

Consider the linear discrete state-space system 

Xk+i = AkXk + BkUk + Wk, (3.1) 
yk = CkXk + Vk. (3.2) 

where Xk G M", yk G M', Uk G R", Ak G M"""", Ck G 
R'^", and Bk G M"^™. We assume that Ak, Bk, and Ck 
are known, while Uk is unknown. Wk G and Vk G 
are unknown Gaussian white noise sequences with known 
covariances Qk and Rk respectively. Without loss of gen- 
erality, we assume that rank(i?fc) — m for some k. Finally, 
we note that Uk is arbitrary and can either be determin- 
istic or stochastic external drivers or be an internal signal 
such nonlinear function of the states. 
We consider a filter of the form 

ifc+i|fc+i = Xk+iik + Lk+i{yk+i - Ck+iXk+i\k), (3.3) 
ik+i\k ^ AkXk\k- (3.4) 

Note that since Uk is unknown, the term BkUk is absent 
in (lO)) . 

The state estimation error is 

Ek = Xk+1 - Xk+i\k+i, (3.5) 



and the error eovariance matrix is defined as 

-ffc+i|fc+i = E [sk+iel^j] , (3.6) 

where E is the expected value. The filter is unbiased if and 
only if 

E[xk+i - Xk+i\k+i] = 0, (3.7) 
or consequently 

E[Akek + BkUk + Wk - Lk+iiCk+iAkEk 

+ Ck+iBkUk + Ck+iwk + vk+i)] = 0. (3.8) 

Since Uk is arbitrary, (|3.8|) implies 

{I-Lk+iCk+i)Bk=Q. (3.9) 

Next, we define the cost function J as the trace of the 
error eovariance matrix 

J{Lk+i) = trE[£fc+i£T^i] =trPfc+i|fc+i. (3.10) 

Theorem 3.1. The unbiased minimum-variance gain 
Lk+i in the filter iS. 3\) that minimizes the cost function 
LS.10\) subject to the constraint \3. 9j) is given by 

Lk+i = BkUk + Fk+iRkli{I - Vk+ili-k). (3.11) 

where 

Uk = {V^+,R^l,Vk+i)-'v;^+,Rkl^, (3.12) 

Rk+i ~ Ck+iPk+i\kCk+i (3.13) 

Fk+i=Pk+i\kCl+i, (3.14) 

Vk+i = Ck+iBk. (3.15) 
Furthermore, the eovariance update equation is 

Pk+l\k+l = Pk+l\k - Pk+lRkllPk+l + 

(Bk - Fk+iR'^l^Vk+i)iVk+iRf:lj^Vk+i)~^ 
X {Bk - Fk+iRkliVk+if, (3.16) 

Pk+i\k = AkPkikAl + Qk- (3.17) 

It is straightforward to check that Lk+i given by (j3.1ip 
satisfies the constraint (|3.9p . Furthermore, in the absence 
of unknown inputs, the traditional Kalman filter gain is 
obtained by setting Bk = in the optimal filter gain Lk+i 
given by p. 111) . 

So far, we discussed unbiased estimation of the state Xk 
in the presence of arbitrary unknown inputs Uk- Next, we 
discuss how the unknown inputs Uk are estimated, using 
the unbiased estimates Xk\k of the states Xk- 

Proposition 3.1. Suppose that Xk\k is an unbiased 



estimate of the states Xk of p.ip . Then 

ilk = BlLk+iivk+i - Ck+iXk+i\k), (3.18) 
is an unbiased estimate of Uk- 

Proof. Since I > p, we can define Uk as 

Ufc = BlLk+iivk+i - Ck+iXk+i\k), (3.19) 

where f denotes the Moore-Penrose generalized inverse. 
Next, we use ([331) and ([339)1 to get 

ilk = Bl{xk+i\k+i - Xk+i\k) 

= Bl{xk+i + Ek+i - AkXkik) 

= Bl{xk+i - AkXk + Sk+i - AkEk) 

= BliBkUk + Wk+Ek+i-AkSk). (3.20) 

Further, taking expected value on both sides of p.20p . 
yields 

E[uk] = E[Bl{BkUk + Wk+ Ek+i - AkEk)], 

(3.21) 

Finally, noting that E[ek] = and the fact that Wk is zero- 
mean, we get 

E[uk]= BlBkE[uk]=E[uk]. a 

Thus by using Theorem 13.11 and Proposition 13.11 we 
can estimate states and simultaneously reconstruct un- 
known inputs. The above development focusses on the 
linear case. The unscented unbiased minimum- variance 
(UUMV) filter, a nonlinear extension of the unbiased 
minimum- variance filter that uses the same expression for 
the gain matrix p. lip and the unscented transform to 
compute the error eovariance matrix Pk^i\k+i is discussed 
in the appendix. 

It is worthwhile to note that since there exists a k such 
that rank(i3fc) = m, it follows from p. 91) that there exists 
k such that Ta.nk{Ck+iBk) = m and therefore I > m. Fi- 
nally, since the above development is in discrete space, we 
use Euler discretization with a small step size to discretize 
the state-space equations (j2.5p . 

3.2. Step 2: Least-squares Function Approximation 

Once estimates of both states x{s) and inputs u{s) in 
(|2.5p are obtained, in step 2, least-squares functions ap- 
proximation tools are used to estimate the constitutive 
law. In this case, a variety of techniques can be applied 
to estimate the constitutive law. To use a standard least- 
squares function approximation technique, we assume that 
g can be expressed as a basis-function expansion 

p 

u,is)=J2^^jMx{s)), J = l, 2, 3, (3.22) 

i=l 



where (/j^ : M° — M arc the basis functions, w^j- are the 
unknown coefficient of the basis function expansion, and 
p is the number of basis functions chosen. Since u{s) and 
x{s) are known, and (pi are user-chosen, the unknown coef- 
ficients J- are then determined by standard least-squares 
fitting. The least-squares solution for the unknown coefR- 
cients is 



(3.23) 



where = [ • • • ] is the coefficient vector, U 
[ m(0) • • • u{L) ] and 



Mm) ••• M<L)) 

0p(a;(O)) ■•• 4>p{x{L)) 



(3.24) 



4. Decoupled One-dimensional Problem 

We start with the simplest case in scenario 3 considered 
in 0. 

In this section, we first make the following assumptions. 

Al The material behavior of DNA molecules is decou- 
pled in the principal directions of bending and tor- 
sion and is internal- force independent. 

A2 The measured quantities are available for all values 
of s. 

A3 Measurements are noise-free. 

These assumptions will be relaxed in subsequent sec- 
tions. 

In view of assumption Al, by choosing the body-fixed 
frame along these principal directions, the vector consti- 
tutive law (|2.7p is decoupled into the following scalar con- 
stitutive law equations 



ui{s) = .gi(xi(s)), 
U2{s) = g2{x2{s)), 
usis) = g3{x3{s)). 



(4.1) 
(4.2) 
(4.3) 



Let the first two axes in the body-fixed frame ai(s) and 
02(5) correspond to the principal bending axes and the 
third axis 03(3) correspond to the principal torsion axis. 
Now, if the applied shear force is acting along axis ai, then 
the bio- filament bends in-plane about axis 02- Thus the 
first and third components of k{s) and q{s) along with the 
second component of /(s) are zero. Thus the rod model 
reduces to the one-dimensional rod model equations 



dq2 f , . 

= -h{s)'i2{s), 

^ = fl(s)K2(s), 



(4.4) 
(4.5) 
(4.6) 




0.2 0.4 0.6 O.i 

length s (nondimensional) 

Figure 3: Actual and estimated deformation K2{s) for a cantilever 
DNA molecule with in-plane deformations. 



with the constitutive-law equation 
plified case 



For this sim- 



defining the state vector as x^-^{s) = 

[ 92(5) /i(s) fais) ]^ eM.^ and the input as u^°{s) = 
^2(5) 6 IS, the one-dimensional rod model equations can 
be written in the state-space form as 



„1-D 



ds 



-ID 



ID/ 



-4^(s)u^^(s) 
4^(s)ui^(s) 



(4.7) 



To illustrate the estimation procedure, we choose g2{-) 
to be both an arctangent function and saturation func- 
tion and set the loading at the free end to be x{0) = 
[1 2 ] , where all numbers are dimensionless. 

Using measurements of /(s) and q{s), in step 1, we use 
the UUMV filter with s as the independent variable, to 
estimate ^2(5). Figure [3] shows estimate of ^2(3) using the 
UUMV filter. Note that the unknown loading conditions 
at the free end (unknown initial conditions) have limited 
detrimental effect on the estimate. 

Once ^2(5) is estimated, in step 2, the function g2{-) is 
estimated by representing the unknown function as an ex- 
pansion of sinusoidal basis functions and using a standard 
least-squares to fit the unknown coefficients of the basis 
function expansion as in (|3.23|) . Figure |4] shows the actual 
and estimate of the constitutive law 52 (Oi when an arctan- 
gent function is used for simulations. Figure [5] shows the 
actual and estimate of the constitutive law g2{-), when a 
saturation function is used for simulations. 

Finally, we note that by running three separate exper- 
iments with excitation along one principal axis at a time, 
we can use the same procedure discussed above to esti- 
mate all three constitutive laws (|4.ip - (|4.3p . and thus the 
complete nonlinear constitutive law. The estimated con- 
stitutive law can then be used to predict deformations for 
any general loading configuration. 
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Figure 4: Actual and estimated arctangent constitutive law for a 
DNA Molecule with the in-plane cantilever deformation. 
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Figure 5: Actual and estimated saturation constitutive law for a 
cantilever DNA molecule with in-plane deformation. 



to be a linear function and two arctangent functions, re- 
spectively. Figures ini [3 and H] show estimates of the three 
components of the constitutive law obtained as described 
above. 
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Figure 6: Actual and estimated linear constitutive law for a cantilever 
DNA molecule with three-dimensional deformation. 
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5. Decoupled Three-dimensional Problem 

As discussed in the previous section, when the consti- 
tutive law is decoupled, the complete constitutive law can 
be estimated from three separate experiments focussing on 
one axis at a time. In this section, we consider the esti- 
mation of the complete decoupled constitutive law from 
a single experiment by choosing the loading conditions at 
the free end appropriately. 

For step 1 of the two-step estimation technique, by 
using complete state-space equations (j2.5p . we use the 
UUMV filter described in the appendix to estimate the un- 
known three-dimensional curvature vector u{s) = k{s). By 
using the estimated curvature vector k(s) and the known 
internal force vector ^(s), we use standard least squares 
fitting to estimate the coefficients of the basis function ap- 
proximation p.23p . 

The loading condition at the free end is chosen to 
be a;(0) = [ 2 -1 -1 -1 -5 ] , while the three 
components of the decoupled constitutive law are chosen 



Figure 7: Actual and estimated arctangent constitutive law for a 
cantilever DNA molecule with three-dimensional deformation. 

Note that the unknown-input matrix used in the UUMV 
filter is state-dependent and thus varying with the inde- 
pendent variable s 
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In the implementation of the UUMV filter, the state es- 
timates are used to construct this state-dependent i?(s) 
matrix. 
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Figure 8: Actual and estimated arctangent constitutive law for a 
cantilever DNA molecule with three-dimensional deformation. 

6. Coupled Three-dimensional Problem 

In this section, we relax assumption Al. That is, we 
let the constitutive law be coupled, and assume a more 
general form of constitutive law written as 

Kiis)^gMs)J{s)). (6.1) 
«2(s) =52(9(5), /(s)), (6.2) 
«3(s)=. 93(9(5), /(s)). (6.3) 

In principle, the coupled constitutive law does not af- 
fect the first step in which the unknown curvature vector 
u(s) = k{s) is estimated using the UUMV filter. However, 
the second step in which the estimated input k{s) and the 
known state are used to estimate a coupled functional rela- 
tionship of the form (j6.ip - (j6.3p using a basis-function fit 
(|3.23p . differs in two ways. First, to estimate a coupled re- 
lationship, multivariable basis functions have to be chosen 
for the basis- function expansion in p.22p . Here, we choose 
thin-plate-spline radial basis functions that have the form 

M^is)) = ||a;(s)-c,f log||x(,s)-c,||, (6.4) 

where Ci are centers of the radial basis functions and are 
chosen by the user. Second, from a practical point of view, 
a single experiment only provides data for a single curve on 
a multi-dimensional constitutive-law surface. Thus a sin- 
gle experiment does not provide enough data to estimate 
the coupled constitutive law. To deal with this issue, we 
used an ensemble of 50 experiments with different loading 
conditions, and thus generating data that represents 50 
different curves on the multi-dimensional constitutive-law 
surface. 

By using an ensemble of experiments, we use the UUMV 
filter to generate estimates of the curvature vector k(s) 
for all 50 experiments. Using this ensemble of estimates 
of k{s) and known q{s) and /(s), a single least-squares 
problem for estimating the coefficients of the basis func- 
tion expansion p.23p is solved. Note that since the UUMV 



filter is robust to unknown initial conditions, the loading 
conditions for the 50 different experiments need not be 
known. 

For demonstrating the above technique, we use the 2- 
D coupled constitutive law ^3(5) = tan^^ (5/3(s)(73(s)). 
Figure [9] shows the second component of the actual consti- 
tutive law, while Figure ITOl shows it's estimate using the 
above-described technique. 

Finally, Figure [TT] shows a validation check using a 
comparison of the results of the elastic rod model simu- 
lations with the actual constitutive law and with the es- 
timated constitutive law for a set of independent loading 
conditions that were not used to estimate the constitutive 
law. 




Figure 9: Actual coupled nonlinear constitutive law for a cantilever 
DNA molecule. 




Figure 10: Estimate of the coupled nonlinear constitutive law for a 
cantilever DNA molecule with three-dimensional deformation. 



7. Robustness 

Next, we relax assumptions A2 and A3, and thus study 
the robustness of the algorithm under sparse and/or noisy 
measurements. For this analysis, we use the decoupled 
one-dimensional problem discussed in section |4] for conve- 
nience. We note that the analysis provided here is also ap- 
plicable to the decoupled three-dimensional problem and 
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Figure 11: Comparison of elastic rod model simulations with the 
true constitutive law and with the estimated constitutive law for an 
independent set of loading conditions. 
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Figure 12: MSE as a function of noise amplitude with no downsam- 
pling. 



the coupled three-dimensional problem since the underly- 
ing algorithms are the same. To assess the quality of the 
estimate, we use the normalized mean square error (MSE) 
defined as 



3r downsampling ol 1000/E9 



MSE : 



1 ^ 



- Ks^)) 



(7.1) 



where k represents the estimate of k, Si represents the mea- 
surement locations, and N is the total number of interior 
measurement locations. 

Figure [T^ represent the MSE as a function of increasing 
noise amplitude Vk- A total of 1000 interior measurement 
points are used for each simulation in the plot. On the 
other hand. Figure [T^ represent the MSE as a function of 
increasing noise amplitude with only 30 interior measure- 
ment points. As expected, the MSE for low noise ampli- 
tudes in Figure [12] is lower than in Figure [T31 However, the 
differences at high noise amplitude are smaller, indicating 
the beyond a noise amplitude of 10~^, the detrimental ef- 
fect of the noise is dominant over the detrimental effect 
due to sparse data. Furthermore, in both plots, the MSE 
is flat for a significant portion of the low noise amplitude 
range. Suggesting that the algorithms are insensitive to 
noise at these low amplitudes. 

Figure [T3] represent the MSE as a function of decreas- 
ing interior measurement points. The measurements were 
assumed to be noise-free. This plot suggests that there 
exists a threshold beyond which, sparser data do not dete- 
riorate the estimates any further. Finally, Figure [14] also 
seems to confirm the belief that more data is always benefi- 
cial. Figure [15] represent the MSE as a function of decreas- 
ing interior measurement points, a gaussian white noise of 
standard deviation 0.01 was used to corrupt all measured 
variables. Again, the flat nature of this plot confirms that 
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Figure 13: MSE as a function of noise amplitude with using 30 points 
out of 1000. 
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Figure 14: MSE as a function of downsampling for zero noise. 
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Figure 15: MSE as a function of downsampling for noise amplitude 
0.01. 



Finally, Figure [1^] shows MSE as a function of increas- 
ing noise amplitude and decreasing number of interior mea- 
surement points. The effect due to noise amplitude and 
sparse measurements seem to be clearly decoupled with 
acceptable accuracy with a region with noise standard de- 
viation 0.01 or less and interior measurement points of 30 
or more. 
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Figure 16: MSE as a function of noise and downsampling. 



8. Partial State Measurements 

Finally, we consider the cases in which the output mea- 
surements y{s) are a subset of the states x{s). As noted 
earlier, in these cases, the measurement equation becomes 



y{s) = Cx{s). 



(8.1) 



Since, we need Z > m for (|3.9p to be satisfied, we ex- 
amine the most difficult scenario I — ni — 3. We consider 



the following representative cases for C for Z = 3. 

C^[h 03x3 ] , 
C = [ 0,3X3 h ] , 



C 



1 
1 
1 



(8.2) 
(8.3) 

(8.4) 



As discussed in Section 13.11 a necessary condition 
for unbiased input reconstruction is that for s such that 
rank(B(s)) = m, rank(CB(s)) = dim(M(s)) = 3. It fol- 
lows that C in (j8.2p and (|8.3p do not satisfy this condition, 
while C in (|8.4p satisfies this necessary condition. There- 
fore, it follows that measurements of either /(s) or q(s) 
alone does not help determine the constitutive law. Thus 
experimentalists will have to design experiments in such 
a way that a combination of components of /(s) and g(s) 
are measured. 

9. Discussion 

Although the results in this paper show promise, sev- 
eral practical and theoretical issues remain. Ongoing and 
future work includes estimating an implicit constitutive 
law that is dependent on s. For both a implicit consti- 
tutive law and s-dependent constitutive law, the input 
reconstruction step to estimate k{s) remains unchanged. 
However, once estimates k{s) are obtained, a more sophis- 
ticated function approximation technique must be used for 
the second step. Moreover, for an implicit constitutive law, 
the implementation of the final elastic rod model equations 
with the estimated constitutive law must be using a DAE 
solver. 

Furthermore, in short length scales, the disturbances 
experienced by DNA molecules due to thermal fluctuations 
of the surrounding aqueous medium and hydrodynamic 
forces often overwhelm deterministic forces and loading 
conditions applied to DNA molecules. Thus, the estima- 
tion algorithms must be robust for state disturbance Wk 
of magnitude larger than the deterministic deformations. 

Finally, measurements in current DNA experiments are 
bulk properties of the molecules such as the elastic strain 
energy which can be expressed as 



U = / qis) ■ dk ds. 

Jo Jnois) 



(9.1) 



Note that (j9.ip is a function of the states at several values 
of the independent variable s and does not readily fit into 
the Kalman filtering and UUMV filtering framework. Both 
these significant challenges will be addressed as part of 
future work. 

10. Conclusion 



We developed a two-step technique to use elastic rod 
model equations in combination with limited experimental 



measurements or high-fidelity molecular dynamics simula- 
tion data to estimate the nonlinear constitutive law gov- 
erning DNA molecules. We first cast the elastic rod model 
equations in state-space form and expressed the effect of 
the unknown constitutive law as an unknown input to the 
system. Then, in step 1, we used input reconstruction 
techniques to estimate unmeasured states and unknown in- 
puts of rod model equations. In step 2, estimates from step 
1 were used in least-squares function fitting to approximate 
the constitutive law. Various simplification and scenarios 
with decoupled constitutive laws and coupled constitutive 
laws were discussed. We investigated the robustness of the 
two-step technique through simulations and made several 
observations. We finished with some concluding discussion 
about future work. 
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Appendix 



with weights 



State Estimation for Nonlinear Systems 

Consider the nonlinear stochastic discrete-time dy- 
namic system 



Xk+l 

Vk 



Ip {xk,Uk,Wk) , 

h{xk) + Vk, 



(10.1) 
(10.2) 



where ?/> : M" xM" xR"? ^ M" and /i : M" M' are, respec- 
tively, the process and observation models. The optimal 
solution to the state-estimation problem is complicated Q 
by the fact that, for nonlinear systems, p{xk\(yi-, ■ ■ ■ , yk)) is 
not completely characterized by its first and second-order 
moments. We thus use an approximation based on the 
classical Kalman filter to provide a suboptimal solution to 
the nonlinear case. 



(m) 

7o 
7o 



A 



(c) (m) (c) 

i = l,...,na. 



2(na + A)' 



where col^ [(•)"'^^^] is the iih column of the Cholcsky square 
root, < Q < 1, /3 > 0, 6* > 0, and A = 02(6* na) - n^. 
We set a = 1 and 6* = El such that A = El and set 



/3 = 2 jl4| . Alternative schemes for choosing sigma points 
are given in [lij . 

The UKF forecast equations are given by 



Unscented Kalman Filter 

First, for nonlinear systems, we consider the unscented 
Kalman filter (UKF) [16| to provide a suboptimal solution 
to the state-estimation problem. Instead of analytically 
linearizing the nonlinear state-space model and using lin- 
ear filter equations, UKF employs the unscented transform 
(UT) [l3l , which approximates the posterior mean y £ 
and covariance P^^ g R'^' of a random vector y obtained 
from the nonlinear transformation y = h{x), where x is a 
prior random vector whose mean x e R" and covariance 
pxx g are assumed known. UT yields the actual 

mean y and the actual covariance P^^ if /i = hi + h2, 
where hi is linear and /12 is quadratic [17[. Otherwise, yk 
is a pseudo mean and P^^ is a pseudo covariance. 

UT is based on a set of deterministically chosen vectors 
known as sigma points. To capture the mean of the 
augmented prior state vector 



Xk 
Wk 



(10.3) 



where € R"'' and ria — « + as well as the augmented 
prior error covariance 



^k — 



pxx n 
Oo 



Jq>(.n ^k 

the sigma-point matrix Xk & ]]j"ax(2"a+i) jg chosen as 



(10.4) 



colo(Xfc) 

C0li(XA;) = x% 



+ y/{n, + X)cok (Pr") 



-.1/2 



i = 1 



: ■ • ■ ; "-a; 



coli+„^(Xfc) =x% 

- v/K + A)coi, [(pr")'^' 

i = 1, . . . ,na. 



(10.5) 



col,(X^+i|,) = ^(col,(XD, "fc, col,(X;i')), z = 0, . . . , 2na, 

(10.6) 

5]7l'"^col,(X-+,|,), (10.7) 



Xk+l\k 



pxx 
^k+l\k 



i=0 
2n. 



|/c) - Xk+l\k\ 



1=0 



(10.8) 

col.(yfe+i|fe) = /i(col,(X^|fe_i)), * = 0,...,2na, (10.9) 

2ria 

5^7|'"^col,(y,+i|,), (10.10) 



yk+i\k 



i=Q 
2n. 



pyy 

^k+l\k 



where 



pxy 



■^k 

X^ 



^7,''''[coli(yA.+i|A.) - yfe+i|fe][coli(yfc+in,) - yk+i\kf + Rk, 

(10.11) 

2na 

^[coli(X^_^ijfc) - ifc+i|fc][col,(yfe+i|fe) - Vk+iikf, 

i=Q 

(10.12) 

Xfe, X?: G R"x(2"a+i), andX'^ G K9x(2»a+i), 



Unbiased Minimum-variance Unscented Filter 

Next, for nonlinear systems with unknown inputs, we 
consider an extension of the UKF along the hnes of the hn- 
ear unbiased minimum- variance filter. Thus, to obtain the 
pseudo mean and the pseudo error covariances we use the 
unscented transform, and to estimate the states and un- 
known inputs, we use the expressions derived for the unbi- 
ased minimum- variance filter. Thus, the forecast equations 
for the unbiased minimum-variance unscented (UMVU) 
filter are given by (jlO.Sp - (|10.12p . The data-assimilation 
equations for the UMVU fihcr arc given by 

Xk+i\k+i = ^'fe+i|fc + +Lk+iiyk+i - /i(ife+i|fe)), (10.13) 
Xk+i\k=^H£k\k,0,0). (10.14) 
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